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(57) Abstract: An image composition system includes an HMD (100) having a right-eye camera (110), a left-eye 
camera (111), a right-eye LCD (130) and a left-eye LCD (131) for displaying a real image, and the like, and an information processing 
apparatus (300) for generating another image different from the real image. A composite image obtained by superimposing the 
other image generated by the information processing apparatus (300) on the real image captured by the right-eye camera (110) and 
left-eye camera (111) is displayed on the right-eye LCD (130) and left-eye LCD (131). The display region of the other image is 
determined based on the position and posture of the head of the user who wears the HMD (100). The user can observe the other 
image superimposed on the real image at an appropriate position while wearing the HMD on his or her head. 
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DESCRIPTION 

IMAGE COMPOSITION APPARATUS AND METHOD 

TECHNICAL FIELD 

The present invention relates to an image composition 
apparatus that displays a real image superimposed with 
another image on a display unit of display means to be worn 
on a head. 

BACKGROUND ART 
Conventionally, when shooting movie or television 
program scenes, a performer acts according to memorized 
script contents. After shooting one scene, a director 
gives directions about that scene to the performer, the 
performer confirms the directions while observing playback 
of a video of himself or herself, and shooting progresses 
while reflecting those directions in the action. Shooting 
proceeds in this manner. 

However, it is a heavy burden for a performer to 
memorize the script contents. Since the director gives 
directions after shooting one scene, the performer cannot 
receive fine directions from the director during action. 
Also, the performer cannot see a video of himself or herself, 
i.e., how he or she is acting, during shooting. 
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description, serve to explain the principles of the 
invention. 

Fig. 1 shows a use example of an image composition 
apparatus according to an embodiment of the present 
invention; 

Fig. 2 is a block diagram showing the use example of 
Fig. 1; 

Fig. 3A is a perspective view of an HMD (Head Mount 
Display) in Figs. 1 and 2 when viewed from the front side; 

Fig. 3B is a perspective view of the HMD (Head Mount 
Display) in Figs. 1 and 2 when viewed from the rear side; 

Fig. 4 shows the generation processes of a video to 
be superimposed on the HMD; 

Fig. 5 is a diagram showing the configuration of 
programs which run on an information processing apparatus 
300 shown in Fig. 2; 

Fig. 6 is a flow chart showing the process of an HMD 
display thread 1000 shown in Fig. 5; 

Fig. 7 is a flow chart showing the process for 
determining a display position in step S103 in Fig. 6; 

Fig. 8 is a flow chart showing the process of a 
terminal management thread 2000 in Fig. 5; 

Fig. 9 is a flow chart showing the process of a script 
management thread 3000 in Fig. 5; and 

Fig. 10 is a state transition chart showing 
transition of the state of a gesture recognition thread 
4000. 
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combination of an optical see-through HMD and video camera 
in place of the video see-through HMD. 

The right-eye camera 110 of the HMD 100 is connected 
to a video capture card 320 of the information processing 
apparatus 300, and the left-eye camera 111 is connected to 
a video capture card 321 of the information processing 
apparatus 300. The right-eye LCD 130 is connected to a 
video card 330 of the information processing apparatus 300, 
and the left-eye LCD 131 is connected to a video card 331 
of the information processing apparatus 300. The LCDs 130 
and 131 display a composite video of those actually captured 
by the left-eye camera 111 and right-eye camera 110, and, 
for example, script data or a video from the video camera 
400 (Fig. 4). 

The video to be displayed on the HMD 100 is generated 
by the information processing apparatus 300. The 
information processing apparatus 300 comprises a CPU 301, 
memory 302, PCI bridge 303, hard disk I/F 340, hard disk 
350, and the like in addition to a serial I/O 310, the video 
capture cards 320, 321, and 322, and video cards 330 and 
331 mentioned above. 

The three-dimensional position sensor 200 comprises 
the three-dimensional position sensor fixed station 210 and 
the three-dimensional sensor mobile station 120 which is 
built in the HMD. The three-dimensional position sensor 
200 measures the relative position between the 
three-dimensional position sensor fixed station 210 and 
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This embodiment uses a magnetic position sensor which 
is separated into a fixed station and position sensor, but 
may use a position sensor using a gyro as long as it can 
measure the position of the HMD. The terminal 600 is used 
to input instructions from a staff member or to input 
shooting start and stop instructions. 

The configuration of programs which run on the 
information processing apparatus 300 will be explained 
below. In the following description, assume that a 
performer wears the HMD 100 in rehearsal upon shooting a 
movie or television program. 
Fig. 5 shows the configuration of programs which run 
on the information processing apparatus 300 in Fig. 2. The 
programs include an HMD display thread 1000, terminal 
management thread 2000, script management thread 3000, and 
gesture recognition thread 4000. Data are exchanged among 
the threads via an instruction buffer 2001, script buffer 
3002, and display mode flag 4001. 
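The data exchange among the threads can be pictured with a minimal Python sketch. It is an illustration only: the class name `SharedState`, its method names, the use of a single lock, and the initial value of the display mode flag are all assumptions and are not taken from this description.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical container for the three items shared among the threads."""
    instruction_buffer: str = ""     # written by the terminal management thread
    script_buffer: str = ""          # written by the script management thread
    display_mode_flag: bool = True   # inverted by the gesture recognition thread (initial value assumed)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def write_instruction(self, text):
        with self._lock:
            self.instruction_buffer = text

    def write_script(self, text):
        with self._lock:
            self.script_buffer = text

    def toggle_display_mode(self):
        with self._lock:
            self.display_mode_flag = not self.display_mode_flag

    def snapshot(self):
        """Read all three items consistently, as a display thread might do once per frame."""
        with self._lock:
            return (self.instruction_buffer, self.script_buffer, self.display_mode_flag)


state = SharedState()
writers = [
    threading.Thread(target=state.write_instruction, args=("Look left",)),
    threading.Thread(target=state.write_script, args=("Line 1",)),
    threading.Thread(target=state.toggle_display_mode),
]
for thread in writers:
    thread.start()
for thread in writers:
    thread.join()
current = state.snapshot()
```

Guarding all three items with one lock keeps a reader from seeing a half-updated combination; whether the original program synchronizes its buffers this way is not stated here.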

The HMD display thread 1000 displays videos captured 
by the right-eye camera 110 and left-eye camera 111 on the 
LCDs 130 and 131. In this case, the thread 1000 
superimposes an instruction from a staff member written in 
the instruction buffer 2001 or script data written in the 
script buffer 3002. An image taken by the television 
camera 400 is also superimposed. 
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determination method of the information display position 
will be explained later. 

After the information display position is determined, 
a video from the video camera 400, script data 350, and an 
instruction from a staff member via the terminal 600 are 
captured (step S104), and the captured information is 
written in a video buffer area corresponding to the 
determined display position (step S105). Since the video 
from the HMD 100 has already been written in the video buffer, 
information is superimposed on that video. 

Upon completion of rendering, the contents of the 
video buffer are transferred to a frame buffer on the video 
card 330 to display (render) the contents (video) of the 
video buffer on the LCD 130 in the HMD 100 (step S106). 
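Steps S105 and S106 can be illustrated with a short Python sketch. It is deliberately simplified: the video buffer is modeled as a list of pixel rows, the display position as the top-left corner of an axis-aligned rectangle, and the function names `superimpose` and `transfer_to_frame_buffer` are hypothetical. The embodiment itself places the information using a conversion matrix, which this sketch does not attempt to reproduce.

```python
def superimpose(video_buffer, overlay, top, left):
    """Write overlay pixels over the camera video already held in video_buffer (cf. step S105)."""
    height = len(video_buffer)
    width = len(video_buffer[0]) if height else 0
    for r, row in enumerate(overlay):
        y = top + r
        if not 0 <= y < height:
            continue  # clip rows that fall outside the buffer
        for c, pixel in enumerate(row):
            x = left + c
            if 0 <= x < width:
                video_buffer[y][x] = pixel
    return video_buffer


def transfer_to_frame_buffer(video_buffer):
    """Copy the finished video buffer, standing in for the transfer in step S106."""
    return [row[:] for row in video_buffer]


# The camera video is written first; here it is a 3 x 4 frame of zeros.
video_buffer = [[0] * 4 for _ in range(3)]
# The information (value 9) is then written at the determined display position.
superimpose(video_buffer, [[9, 9], [9, 9]], top=1, left=2)
frame_buffer = transfer_to_frame_buffer(video_buffer)
```

Because the camera video is written before the information, the information simply overwrites the pixels inside the display region and the rest of the real image is left untouched.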

The determination method of the information display 
position in step S103 will be described below. 

Fig. 7 is a flow chart of the process for determining 
the information display position in step S103 in Fig. 6. 
The position of the HMD 100 is acquired from the 
three-dimensional sensor main body 200 (step S200). The 
information processing apparatus 300 generates and sets a 
modeling conversion matrix on the basis of the position 
acquired in step S200, the coordinate position of the HMD 
100, and parameters such as the focal length of the camera 
and the like, which are measured in advance, so as to obtain 
an image from the viewpoint of the HMD 100 (step S201). Note 
that the "position and posture of the viewpoint of an 
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video camera 400 and information from the script data 350 
are determined to be the center of the four corners of the 
table, and the video and the information are rendered, as 
indicated by a video 4c. The videos 4a and 4c are composited 
to obtain a video 4d, which is observed by the person who 
wears the HMD 100. Note that Fig. 4 schematically illustrates 
the respective videos, and some of the video contents, 
composite positions, and the like are not accurate. 

In this embodiment, as described previously, since 
the information is superimposed on the table, the 
coordinate position of which is known, the field of view 
of the performer can be prevented from being obstructed. 
Display of information on the table does not limit the 
present invention. For example, information may be 
superimposed on a portion of a wall, the coordinate position 
of which is known. Furthermore, the position of the display 
can be dynamically changed. For example, the position of 
the display may be changed from a desk to a wall. 

The terminal management thread 2000 in Fig. 5 mainly 
processes an input from the terminal 600, and writes an 
instruction from a staff member to the performer via the 
terminal 600 in the instruction buffer 2001. At the same 
time, the terminal management thread 2000 informs the 
script management thread 3000 of the staff member's operations. 

The process of the terminal management thread 2000 
will be described below. 
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The script data is stored as a sequence of sets of time stamps 
and character strings to be displayed at those timings. 

The process of the script management thread 3000 will 
be described below. 

Fig. 9 is a flow chart showing the process of the 
script management thread 3000 in Fig. 5. 

As an initial setup process, an internal shooting 
clock 3001 is reset to zero, and a pointer of script data 
is returned to the head of the script (step S400). 

Data (next script data to be displayed) pointed to by 
the pointer of the script data is loaded from the hard disk 
350 (step S401). The control waits until the shooting clock 
3001 becomes the same as the time stamp of the script data 
pointed to by the pointer (step S402). 

The script data pointed to by the pointer is written in 
the script buffer (step S403). The pointer of the script 
data is advanced (step S404), and the flow returns to step 
S401 to repeat the aforementioned steps. 
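The loop of steps S400 to S404 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the script is a list of (time stamp in seconds, character string) pairs, `write` stands in for writing to the script buffer, the wait is implemented by polling until the clock reaches or passes the time stamp, and the loop stops at the end of the script (the flow chart's behavior at that point is not described here). The names `run_script` and `FakeClock` are hypothetical; the fake clock is used only so that the example runs instantly.

```python
import time


def run_script(script, write, clock=time.monotonic, sleep=time.sleep):
    """Replay (time_stamp, text) pairs in order, following steps S400 to S404."""
    start = clock()   # S400: reset the shooting clock to zero
    pointer = 0       # S400: return the pointer to the head of the script
    while pointer < len(script):
        time_stamp, text = script[pointer]          # S401: load the next script data
        remaining = time_stamp - (clock() - start)
        while remaining > 0:                        # S402: wait for the time stamp
            sleep(min(remaining, 0.01))
            remaining = time_stamp - (clock() - start)
        write(text)                                 # S403: write to the script buffer
        pointer += 1                                # S404: advance the pointer


class FakeClock:
    """Deterministic stand-in for a real clock; sleeping simply advances the time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
shown = []
run_script(
    [(0.0, "Good morning."), (1.5, "Where were you?")],
    lambda text: shown.append((round(fake.now, 2), text)),
    clock=fake.time,
    sleep=fake.sleep,
)
```

With the default arguments the same function runs against the real monotonic clock, so each character string reaches the buffer at its time stamp measured from the start of shooting.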

The gesture recognition thread 4000 in Fig. 5 
recognizes a gesture (hand position and posture) of the 
performer on the basis of the position of the hand position 
sensor 220 obtained via the three-dimensional position 
sensor main body 200. Every time a gesture is recognized, 
the display mode flag 4001 is turned on/off. 

In this embodiment, as a gesture for turning on/off 
display, an action of moving the hand up and down three 
times within one second is selected. 
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When a downward acceleration is detected in the 
"upward acceleration stop state S502", the gesture 
recognition thread 4000 transits to a "downward 
acceleration state S503". When the acceleration has 
stopped, the gesture recognition thread 4000 transits to 
a "downward acceleration stop state S504". When a downward 
acceleration is detected again within 0.1 sec after 
transition, the gesture recognition thread 4000 returns to 
the "downward acceleration state S503". 

When an upward acceleration is detected in the 
"downward acceleration stop state S504", the gesture 
recognition thread 4000 transits to the "upward 
acceleration state S501". At this time, the internal 
counter is incremented. This corresponds to a case wherein 
the hand is moved downward after upward movement. 

When no acceleration is detected within 0.1 sec after 
transition to the "downward acceleration stop state S504", 
the gesture recognition thread 4000 transits to the 
"standby state S500". If counter = 3, it is determined 
that the gesture is complete, and an event is generated to 
invert the value (TRUE/FALSE) of the display mode flag 4001. 
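The state transitions can be sketched in Python as below. Only the transitions involving states S502 to S504 are taken from the text above; the transitions out of S500 and S501, the timeout in S502, and resetting the counter to zero when leaving standby are assumptions made to obtain a runnable example, as are the names `GestureRecognizer` and `feed`. Under these assumptions the counter counts down-then-up reversals, so the number of strokes needed to reach counter = 3 in this sketch may differ from the original chart.

```python
from enum import Enum


class State(Enum):
    STANDBY = "S500"
    UP_ACCEL = "S501"
    UP_STOP = "S502"
    DOWN_ACCEL = "S503"
    DOWN_STOP = "S504"


TIMEOUT = 0.1  # seconds without acceleration before returning to standby


class GestureRecognizer:
    def __init__(self, required_count=3):
        self.state = State.STANDBY
        self.counter = 0
        self.entered_at = 0.0
        self.display_mode = False
        self.required_count = required_count

    def _go(self, state, t):
        self.state = state
        self.entered_at = t

    def feed(self, t, accel):
        """Process one sample at time t; accel is 'up', 'down', or None (no acceleration)."""
        timed_out = accel is None and t - self.entered_at >= TIMEOUT
        if self.state is State.STANDBY:
            if accel == "up":                      # assumed transition
                self.counter = 0                   # assumed reset
                self._go(State.UP_ACCEL, t)
        elif self.state is State.UP_ACCEL:
            if accel is None:                      # assumed transition
                self._go(State.UP_STOP, t)
        elif self.state is State.UP_STOP:
            if accel == "down":                    # from the text: S502 -> S503
                self._go(State.DOWN_ACCEL, t)
            elif accel == "up":                    # assumed, by symmetry with S504
                self._go(State.UP_ACCEL, t)
            elif timed_out:                        # assumed, by symmetry with S504
                self._go(State.STANDBY, t)
        elif self.state is State.DOWN_ACCEL:
            if accel is None:                      # from the text: S503 -> S504
                self._go(State.DOWN_STOP, t)
        elif self.state is State.DOWN_STOP:
            if accel == "down":                    # from the text: S504 -> S503
                self._go(State.DOWN_ACCEL, t)
            elif accel == "up":                    # from the text: S504 -> S501, count up
                self.counter += 1
                self._go(State.UP_ACCEL, t)
            elif timed_out:                        # from the text: S504 -> S500
                if self.counter == self.required_count:
                    self.display_mode = not self.display_mode
                self._go(State.STANDBY, t)


recognizer = GestureRecognizer()
t = 0.0
for _ in range(4):
    for accel in ("up", None, "down", None):
        recognizer.feed(t, accel)
        t += 0.05
recognizer.feed(t + 0.2, None)  # silence longer than TIMEOUT ends the gesture
```

In this run the counter reaches 3 on the fourth upward stroke, and the final timeout in the downward acceleration stop state inverts `display_mode`.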

The gesture recognition thread 4000 executes the 
process according to the aforementioned state transition 
chart to detect an event. In the above description, a 
full-superimpose display ON/OFF instruction is issued by 
a gesture. However, for example, display of script data, 
instruction data, and video camera image may be 
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In this case, the program code itself read out from 
the storage medium implements the functions of the 
above-mentioned embodiments, and the storage medium which 
stores the program code constitutes the present invention. 

As the storage medium for supplying the program code, 
for example, a floppy disk, hard disk, optical disk, 
magneto-optical disk, CD-ROM, CD-R, magnetic tape, 
nonvolatile memory card, ROM, and the like may be used. 

The functions of the above-mentioned embodiments may 
be implemented not only by executing the readout program 
code by the computer but also by some or all of actual 
processing operations executed by an OS (operating system) 
running on the computer on the basis of an instruction of 
the program code. 

Furthermore, the functions of the above-mentioned 
embodiments may be implemented by some or all of actual 
processing operations executed by a CPU or the like arranged 
in a function extension board or a function extension unit, 
which is inserted in or connected to the computer, after 
the program code read out from the storage medium is written 
in a memory of the extension board or unit. 

As described in detail above, according to the image 
composition apparatus, since another image is displayed on 
a display unit that displays a real image, the other image 
can be superimposed on the real image. Hence, the user who 
wears display means on the head can observe the other image 
superimposed on the real image. 
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CLAIMS 

1. An image composition system for compositing a real 
image in a line-of-sight direction of a user with another 
image, comprising: 

a display unit which is wearable on a head of the user, 
and displays a composite image; 

a position sensor for detecting the line-of-sight 
direction of the user, and outputting line-of-sight 
information; 

a determination unit for determining a display region 
where the other image is to be displayed, in accordance with 
the line-of-sight information; and 

a composition unit for compositing the other image 
on the determined display region, 

wherein the other image is used to display 
information that helps operations of the user. 

2. The system according to claim 1, wherein said display 
unit has an optical see-through structure, and the user can 
observe a real space via said display unit. 

3. The system according to claim 1, further comprising: 

a first image taking device for obtaining a video of 
a real space observed from a viewpoint of the user, and 

wherein said composition unit displays the video 
obtained by said first image taking device on said display 
unit, and superimposes the other image on the display region 
determined by said determination unit. 
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9. The system according to claim 8, wherein said 
composition unit switches contents of the other image to 
be displayed on the display region in response to a 
predetermined action detected by said gesture detection 
unit. 

10. The system according to claim 1, wherein the 
information that helps the operations of the user is dialog 
information. 

11. The system according to claim 1, wherein the 
information that helps the operations of the user is an 
image obtained by taking an image of an action of the user. 

12. An information processing method of displaying a 
composite image of a real image in a line-of-sight direction 
of a user and another image on a display unit which is 
wearable on a head of the user, comprising the steps of: 

detecting the line-of-sight direction of the user to 
acquire line-of-sight information; 

determining a display region where the other image 
is to be displayed, in accordance with the line-of-sight 
information; and 

compositing the other image on the determined display 
region, 

wherein the other image is used to display 
information that helps operations of the user. 

13. The method according to claim 12, wherein the other 
image information is a video obtained by image taking means 
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